TLB lock and unlock operation

(57) A digital system and method of operation is provided in which several processing resources (340) and
processors (350) are connected to a shared translation lookaside buffer (TLB) (300, 310(n)) of a memory
management unit (MMU) and thereby access memory and devices. These resources can be instruction processors,
coprocessors, DMA devices, etc. Each entry location in the TLB is filled during the normal course of action
by a set of translated address entries (308, 309) along with qualifier fields (301, 302, 303) that are
incorporated with each entry. Operations can be performed on the TLB that are qualified by the various
qualifier fields. A command (360) is sent by an MMU manager to the control circuitry of the TLB (320) during
the course of operation. Commands are sent as needed to flush (invalidate), lock or unlock selected entries
within the TLB. Each entry in the TLB is accessed (362, 368) and the qualifier field specified by the
operation command is evaluated (364). This can be task ID field 302, resource ID field 301, shared
indicator 303, or combinations of these. Operation commands can also specify a selected virtual address
entry (305). Each TLB entry is modified in response to the command (366) only if its qualifier field(s)
match the qualifier(s) specified by the operation command.
Description

[0001] This application claims priority to European Application Serial No. 00402331.3, filed August 21, 2000.
[0002] This invention generally relates to computer processors, and more specifically to improvements in translation
lookaside buffers for address translation, systems, and methods of making.

[0003] Microprocessors are general purpose processors which provide high instruction throughputs in order to
execute software running thereon, and can have a wide range of processing requirements depending on the particular
software applications involved.

[0004] Many different types of processors are known, of which microprocessors are but one example. For example,
Digital Signal Processors (DSPs) are widely used, in particular for specific applications, such as mobile processing
applications. DSPs are typically configured to optimize the performance of the applications concerned and to achieve
this they employ more specialized execution units and instruction sets. Particularly in applications such as mobile
telecommunications, but not exclusively, it is desirable to provide ever increasing DSP performance while keeping
power consumption as low as possible.

[0005] To further improve performance of a digital system, two or more processors can be interconnected. For
example, a DSP may be interconnected with a general purpose processor in a digital system. The DSP performs numeric
intensive signal processing algorithms while the general purpose processor manages overall control flow. The two
processors communicate and transfer data for signal processing via shared memory. A direct memory access (DMA)
controller is often associated with a processor in order to take over the burden of transferring blocks of data from one
memory or peripheral resource to another and to thereby improve the performance of the processor.

[0006] Modular programming builds a computer program by combining independently executable units of computer 
code (known as modules), and by tying modules together with additional computer code. Features and functionality 
that may not be provided by a single module may be added to a computer program by using additional modules.
[0007] The design of a computer programming unit known as a task (or function) is often accomplished through
modular programming, where a specific task is comprised of one module and the additional computer code needed to
complete the task (if any additional code is needed). However, a task may be defined as broadly as a grouping of
modules and additional computer codes, or as narrowly as a single assembly-type stepwise command. A computer
program may be processed (also called "run" or "executed") in a variety of manners. One manner is to process the
computer code sequentially, as the computer code appears on a written page or on a computer screen, one command
at a time. An alternative manner of processing computer code is called task processing. In task processing, a computer
may process computer code one task at a time, or may process multiple tasks simultaneously. In any event, when
processing tasks, it is generally beneficial to process tasks in some optimal order.

[0008] Unfortunately, different tasks take different amounts of time to process. In addition, the result, output, or end 
point of one task may be required before a second task may begin (or complete) processing. Furthermore, particularly 
in a multiple processor environment, several tasks may need access to a common resource that has a generally fixed
capacity. 

[0009] In order to better manage program tasks and physical memory, a concept of virtual memory and physical
memory has evolved. Program task modules are generally compiled and referenced to virtual addresses. When a task
is executed in physical memory, address translation is performed using a cache of translated addresses, referred to
as a translation lookaside buffer (TLB). TLBs must be managed to optimize system performance as various tasks are
executed.

[0010] Accordingly, there is needed a system and method for managing task processing and address translation that 
takes into account active tasks, active resources, and other task processing needs. 

[0011] Particular and preferred aspects of the invention are set out in the accompanying independent and dependent
claims. In accordance with a first embodiment of the invention, a method is provided for operating a digital system
having a processor and associated translation lookaside buffer (TLB). Several program tasks are executed within the
processor that initiate a sequence of memory access requests in response to the program tasks. In response to the
sequence of memory access requests, a set of translated memory addresses are cached in the TLB. A task identification
value is incorporated with each translated memory address to identify which of the program tasks requested the
respective translated memory address.

[0012] A portion of the set of translated memory addresses in the TLB is locked or unlocked in a manner that is qualified
by the task identification value, such that only an entry belonging to a selected program task in the set of translated
memory addresses is affected.

[0013] In an embodiment of the invention, the TLB has several levels, and the step of locking or unlocking
encompasses all of the levels of the TLB.

[0014] In another embodiment, a selected victim translated memory address is replaced with a different translated 
memory address, in a manner that the victim translated memory address is selected only from a portion of the set of
translated memory addresses in the TLB that is not locked. 
[0015] In another embodiment, one or more TLB locations are reserved from being locked.
[0016] Another embodiment of the invention is a digital system that has a translation lookaside buffer (TLB). The
TLB includes storage circuitry with a set of entry locations for holding translated values, wherein each of the set of
entry locations includes a first field for a translated value and a second field for an associated qualifier value. There is
a set of inputs for receiving a translation request, a set of outputs for providing a translated value selected from the
set of entry locations, and control circuitry connected to the storage circuitry. The control circuitry is responsive to an
operation command to lock or unlock selected ones of the set of entry locations that have a first qualifier value in the
second field.

[0017] In another embodiment, the control circuitry includes a shift register connected to the storage circuitry for 
selecting a next victim entry location and skip circuitry connected to the shift register and to the storage circuitry. The
skip circuitry is operable to cause the shift register to skip over locked entry locations. 

[0018] In another embodiment, the control circuitry includes reservation circuitry operable to reserve a portion of the
entry locations from being locked.

[0019] Particular embodiments in accordance with the invention will now be described, by way of example only, and
with reference to the accompanying drawings in which like reference signs are used to denote like parts and in which
the Figures relate to the digital system of Figure 1 and in which:

Figure 1 is a block diagram of a digital system that includes an embodiment of the present invention in a megacell
core having multiple processor cores;

Figure 2A and 2B together is a more detailed block diagram of the megacell core of Figure 1;

Figure 3A is a block diagram illustrating a shared translation lookaside buffer (TLB) and several associated
micro-TLBs (µTLB) included in the megacell of Figure 2;

Figure 3B is a flow chart illustrating a method of operating the TLB of Figure 3A;

Figure 4 is a block diagram of a digital system similar to Figure 1 illustrating a cloud of tasks that are scheduled
for execution on the various processors of the digital system;

Figure 5 illustrates a TLB control format used to operate on the TLB and µTLBs of Figure 3A;

Figure 6 illustrates operation of the TLB of Figure 3A for selective flushing of an entry for a given task or resource;

Figure 7 illustrates control circuitry for adaptive replacement of TLB entries in the TLB of Figure 3A;

Figure 8A is a schematic illustrating an alternative embodiment of control circuitry that utilizes a shift register for
adaptive replacement of TLB entries in the TLB of Figure 3A;

Figure 8B is a schematic illustrating an alternative embodiment of the control circuitry of Figure 8A; and

Figure 9 is a representation of a telecommunications device incorporating an embodiment of the present invention.

[0020] Corresponding numerals and symbols in the different figures and tables refer to corresponding parts unless
otherwise indicated.

Detailed Description of Embodiments of the Invention 

[0021] Although the invention finds particular application to Digital Signal Processors (DSPs), implemented, for
example, in an Application Specific Integrated Circuit (ASIC), it also finds application to other forms of processors. An
ASIC may contain one or more megacells which each include custom designed functional circuits combined with
pre-designed functional circuits provided by a design library.

[0022] Figure 1 is a block diagram of a digital system that includes an embodiment of the present invention in a
megacell core 100 having multiple processor cores. In the interest of clarity, Figure 1 only shows those portions of
megacell 100 that are relevant to an understanding of an embodiment of the present invention. Details of general
construction for DSPs are well known, and may be found readily elsewhere. For example, U.S. Patent 5,072,418 issued
to Frederick Boutaud, et al, describes a DSP in detail. U.S. Patent 5,329,471 issued to Gary Swoboda, et al, describes
in detail how to test and emulate a DSP. Details of portions of megacell 100 relevant to an embodiment of the present
invention are explained in sufficient detail herein below, so as to enable one of ordinary skill in the microprocessor art
to make and use the invention.

[0023] Referring again to Figure 1, megacell 100 includes a control processor (MPU) 102 with a 32-bit core 103 and
a digital signal processor (DSP) 104 with a DSP core 105 that share a block of memory 113 and a cache 114, that are
referred to as a level two (L2) memory subsystem 112. A traffic control block 110 receives transfer requests from a
host processor connected to host interface 120b, requests from control processor 102, and transfer requests from a
memory access node in DSP 104. The traffic control block interleaves these requests and presents them to the shared
memory and cache. Shared peripherals 116 are also accessed via the traffic control block. A direct memory access
controller 106 can transfer data between an external source such as off-chip memory 132 or on-chip memory 134 and
the shared memory. Various application specific processors or hardware accelerators 108 can also be included within
the megacell as required for various applications and interact with the DSP and MPU via the traffic control block.
[0024] External to the megacell, a level three (L3) control block 130 is connected to receive memory requests from
internal traffic control block 110 in response to explicit requests from the DSP or MPU, or from misses in shared cache
114. Off chip external memory 132 and/or on-chip memory 134 is connected to system traffic controller 130; these are
referred to as L3 memory subsystems. A frame buffer 136 and a display device 138 are connected to the system traffic
controller to receive data for displaying graphical images. A host processor 120a interacts with the external resources
through system traffic controller 130. A host interface connected to traffic controller 130 allows access by host 120a
to external memories and other devices connected to traffic controller 130. Thus, a host processor can be connected
at level three or at level two in various embodiments. A set of private peripherals 140 are connected to the DSP, while
another set of private peripherals 142 are connected to the MPU.

[0025] Figure 2, comprised of Figure 2A and Figure 2B together, is a more detailed block diagram of the megacell core
of Figure 1. DSP 104 includes a configurable cache 203 that is configured as a local memory 200 and data cache 202,
and a configurable cache 204 that is configured as instruction cache 206 and a RAM-set 208, which are referred to as
level one (L1) memory subsystems. The DSP is connected to the traffic controller via an L2 interface 210 that also
includes a translation lookaside buffer (TLB) 212. A DMA circuit 214 is also included within the DSP. Individual micro
TLBs (µTLB) 216-218 are associated with the DMA circuit, data cache and instruction cache, respectively.
[0026] Similarly, MPU 102 includes a configurable cache 223 that is configured as a local memory 220 and data
cache 222, and a configurable cache 224 that is configured as instruction cache 226 and a RAM-set 228, again referred
to as L1 memory subsystems. The MPU is connected to traffic controller 110 via an L2 interface 230 that also includes
a TLB 232. A DMA circuit 234 is also included within the MPU. Individual micro TLBs (µTLB) 236-238 are associated
with the DMA circuit, data cache and instruction cache, respectively.

[0027] L2 traffic controller 110 includes a TLB 240 and one or more micro-TLBs (µTLB) 242 that are associated with
system DMA block 106, host processor interface 120b for a host connected at level two, and other application specific
hardware accelerator blocks. Similarly, L3 traffic controller 130 includes a µTLB controllably connected to TLB 240 that
is associated with system host 120a at level three. This µTLB is likewise controlled by one of the megacell 100
processors.

Memory Management Unit 

[0028] At the megacell traffic controller level, all addresses are physical. They have been translated from virtual to
physical at the processor sub-system level by a memory management unit (MMU) associated with each core, such as
DSP core 105 and MPU core 103. At the processor level, access permission, supplied through MMU page descriptors,
is also checked, while at the megacell level protection between processors is enforced by other means, which will be
described in more detail later. Each MMU includes a TLB and its associated µTLBs.

[0029] The translation lookaside buffer (TLB) caches contain entries for virtual-to-physical address translation and
page descriptor information such as access permission checking, cache policy for various levels, etc. If the TLB contains
a translated entry for the virtual address, the access control logic determines whether the access is permitted. If access
is permitted, the MMU generates the appropriate physical address corresponding to the virtual address. If access is
not permitted, the MMU sends an abort signal via signal group 244 to the master CPU 102. The master CPU is identified
by the value of the R-ID field. On a slave processor such as a hardware accelerator the R-ID is equal to the R-ID of
the master CPU.

[0030] Upon a TLB miss, i.e., the TLB does not contain an entry corresponding to the virtual address requested, an 
exception is generated that initiates a translation table walk software routine. The TLB miss software handler retrieves 
the translation and access permission information from a translation table in physical memory. Once retrieved, the 
page or section descriptor is stored into the TLB at a selected victim location. Victim location selection is done by
software or with hardware support, as will be described later. 
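
The miss handling described above can be illustrated with a minimal sketch. It assumes 4 kB pages, a Python dictionary standing in for the translation table in physical memory, and a plain round-robin victim pointer; the names `TinyTLB`, `Descriptor` and `translate` are hypothetical and do not come from the described embodiment.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 kB pages only


@dataclass
class Descriptor:
    ppn: int          # physical page number
    permission: str   # placeholder encoding, not taken from the source


class TinyTLB:
    def __init__(self, size):
        self.entries = [None] * size   # each slot holds (vpn, Descriptor)
        self.victim = 0                # location that the next fill overwrites

    def lookup(self, va):
        vpn = va >> PAGE_SHIFT
        for slot in self.entries:
            if slot is not None and slot[0] == vpn:
                return slot[1]
        return None                    # TLB miss

    def fill(self, vpn, desc):
        self.entries[self.victim] = (vpn, desc)
        self.victim = (self.victim + 1) % len(self.entries)


def translate(tlb, table, va):
    """Return the physical address, walking the table in software on a miss."""
    desc = tlb.lookup(va)
    if desc is None:
        # Stand-in for the exception that starts the table walk routine.
        desc = table[va >> PAGE_SHIFT]
        tlb.fill(va >> PAGE_SHIFT, desc)
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return (desc.ppn << PAGE_SHIFT) | offset
```

In this sketch a second access to the same page is served from the TLB without consulting `table` again, which is the point of caching the descriptor at the victim location.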

Translation Table 

[0031] To provide maximum flexibility, the MMU is implemented as a software table walk, backed up by TLB caches
both at the processor sub-system and megacell level. This allows easy addition of new page size support or new page
descriptor information if required. A TLB miss initiates a TLB handler routine to load the missing reference into the
TLB. At the megacell 100 level, a TLB miss asserts a miss signal in signal group 244 and is routed via system interrupt
router 250 to the processor having generated the missing reference or to the processor in charge of the global memory
management, via interrupt signals 251, 252.

[0032] Translation tables and TLB cache contents must be kept consistent. A flush operation is provided for this 
reason and will be described in more detail later. 

[0033] An address reference is generally located within the µTLB or main TLB of each processor sub-system;
however, certain references, such as those used by system DMA 106 or host processor 120, for example, to access
megacell memories can be distributed within L2 traffic controller 110 and cached into L2 system shared TLB 240. Because
system performance is very sensitive to the TLB architecture and size, it is important to implement efficient TLB control
commands to lock entries for critical tasks or unlock and flush those entries when a task is deleted without degrading
the execution of other tasks. Therefore, each TLB and L2 cache entry holds a task-ID. Commands are supplied to flush
locked or unlocked entries of a TLB/µTLB corresponding to a selected task.

[0034] As part of the page descriptor information, the MMU provides cacheability and bufferability attributes for all
levels of memory. The MMU also provides a "Shared" bit for each entry to indicate that a page is shared among multiple
processors (or tasks). This bit, as standalone or combined with the task-ID, allows specific cache and TLB operations
on data shared between processors and/or tasks. The MMU may also provide additional information, such as memory
access permission and access priority as described later.

[0035] All megacell memory accesses are protected by a TLB. As they all have different requirements in terms of
access frequencies and memory size, a shared TLB with individual µTLB backup approach has been chosen to reduce
the system cost at the megacell level. This shared TLB is programmable by each processor. The architecture provides
enough flexibility to let the platform work with either independent operating systems (OS) on each processor or a
distributed OS with a unified memory management, for example.

[0036] The present embodiment has a distributed operating system (OS) with several domains corresponding to
each processor but only a single table manager for all processors. Slave processors do not manage the tables. In a
first embodiment, the R-IDs of slave processors are equal to the R-ID of the master CPU. In another embodiment, they
could, however, have a different R-ID to control their own TLB entries: to lock or unlock entries corresponding to some
of their own tasks, or to flush all their entries when putting themselves in sleep mode to free entries for the other
processors. Having different R-IDs provides a means to increase security in a concurrent multi-processor environment:
processor X cannot access memory allocated to processor Y.

[0037] In another embodiment with several independent OS(s), for example, there will be independent tables. These
tables can be located in a memory space only viewed by the OS that they are associated with in order to provide
protection from inadvertent modification by another OS. As they manage the virtual memory and task independently,
the R-ID provides the necessary inter-processor security. R-IDs are managed by a single master CPU. This CPU can
make TLB operations on all TLB entries. TLB operations or memory accesses from slave processors are restricted by
their own R-ID. The CPU master will have rights to flush out entries belonging to another processor in a different OS
domain.

[0038] The organization of the data structures supporting the memory management descriptor is flexible since each
TLB miss is resolved by a software TLB-miss handler. These data structures include the virtual-to-physical address
translation and all additional descriptors to manage the memory hierarchy. A list of these descriptors and their function
is described in Table 2. Table 1 includes a set of memory access permission attributes, as an example. In other
embodiments, a processor may have other modes that enable access to memory without permission checks.



Table 1 - Memory Access Permission

Supervisor     User
No access      No access
Read only      No access
Read only      Read only
Read/Write     No access
Read/Write     Read only
Read/Write     Read/Write
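
As an illustration only, the six supervisor/user combinations of Table 1 can be held in a small lookup table and consulted on each access. The row index used to select a combination is a hypothetical encoding; the source does not state how the permission attribute is encoded in a descriptor.

```python
# Rows of Table 1 as (supervisor right, user right).
# The row index is a hypothetical encoding chosen for this sketch.
NO, RO, RW = "no access", "read only", "read/write"
TABLE_1 = [
    (NO, NO),
    (RO, NO),
    (RO, RO),
    (RW, NO),
    (RW, RO),
    (RW, RW),
]


def access_permitted(row, supervisor, write):
    """Check one access against a Table 1 row."""
    right = TABLE_1[row][0 if supervisor else 1]
    if right == NO:
        return False
    if write:
        return right == RW
    return True   # reads are allowed for both read only and read/write
```

For example, with row 4 a supervisor may read and write a page while a user may only read it.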



Table 2 - Memory Management Descriptors

Execute Never
    Provides access permission to protect data memory area from being executed. This information
    can be combined with the access permission described above or kept separate.

Shared
    Indicates that this page may be shared by multiple tasks across multiple processors.

Cacheability
    Various memory entities such as individual processor's cache and write buffer, and shared cache
    and write buffer are managed through the MMU descriptor. The options included in the present
    embodiment are as follows: Inner cacheable, Outer cacheable, Inner write through/write back,
    Outer write through/write back, and Outer write allocate. The terms inner and outer refer to levels
    of caches that are built in the system. The boundary between inner and outer is defined in a
    specific embodiment, but inner will always include L1 cache. In a system with 3 levels of caches,
    the inner corresponds to L1 and L2 cache and the outer corresponds to L3 due to existing processor
    systems. In the present embodiment, inner is L1 and outer is L2 cache.

Endianism
    Determines on a page basis the endianness of the transfer.

Priority
    Indicates a priority level for the associated memory address region. Memory access can be
    prioritized based on this priority value.


MMU/TLB Control Operation 

[0039] Figure 3A is a block diagram illustrating a shared translation lookaside buffer (TLB) 300 and several associated
micro-TLBs (µTLB) 310(0)-310(m) included in megacell 100 of Figure 2. On a µTLB miss, the shared TLB is first
searched. TLB controller 320 is alerted by asserting a µTLB miss signal 324. In case of a hit on the shared TLB, the
µTLB that missed is loaded with the entry content of the shared TLB 300. In the case of a miss in shared TLB 300, the
shared TLB alerts TLB controller 320 by asserting a TLB miss signal 326. Controller 320 then asserts an interrupt
request signal 328 to system interrupt controller 250. Interrupt controller 250 asserts an interrupt to the processor
whose OS supervises the resource which caused the miss. A TLB entry register 330 associated with TLB controller
320 is loaded by a software TLB handler in response to the interrupt. Once loaded, the contents of TLB entry register
330 are transferred to both shared TLB 300 and the requesting µTLB at a selected victim location as indicated by arcs
332 and 334.

[0040] A separate TLB entry register 330 is only one possible implementation and is not necessarily required. The
separate TLB entry register is a memory mapped register that allows buffering of a complete TLB entry (more
than 32 bits). A TLB value is not written directly in the TLB cache but is written to the TLB entry register first. Because
of the size of an entry, several writes are required to load the TLB entry register. Loading of a TLB cache entry is then
done in a single operation "Write TLB entry". Advantageously, other µTLBs associated with other modules can continue
to access the shared TLB while the TLB entry register is being loaded, until a second miss occurs. Advantageously,
by controlling access to the TLB via the TLB entry register, CPUs have no direct access to TLB cache internal structure
and thus the risk of partial modifications inconsistent with the MMU tables is avoided.
[0041] The sequence of operations to update a TLB cache entry after a miss is:

1- the software TLB handler writes to the TLB entry register;

2- the software TLB handler sends a command to write the TLB entry, which transfers a value from TLB entry
register to a preselected victim TLB cache entry; and

3- control circuitry checks and preselects a next victim TLB entry, in preparation for the next miss. In this
embodiment, this step is generally performed in background prior to the occurrence of a miss.
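
The three-step update sequence can be sketched as follows. This is a simplified software model under stated assumptions: the entry register is represented as a dictionary filled by several writes, victim preselection is a plain round robin, and the names `SharedTLB`, `write_entry_register` and `write_tlb_entry` are hypothetical rather than taken from the described hardware.

```python
class SharedTLB:
    def __init__(self, size):
        self.cache = [None] * size   # TLB cache entries
        self.entry_register = {}     # staging register, built up by several writes
        self.victim = 0              # preselected victim location

    def write_entry_register(self, field, value):
        # Step 1: one of the several writes needed to assemble a complete entry.
        self.entry_register[field] = value

    def write_tlb_entry(self):
        # Step 2: a single operation moves the staged entry into the victim slot,
        # so the cache never holds a partially written entry.
        self.cache[self.victim] = dict(self.entry_register)
        self.entry_register.clear()
        # Step 3: preselect the next victim in preparation for the next miss
        # (assumed here to be a simple round robin).
        self.victim = (self.victim + 1) % len(self.cache)
```

Keeping the staging register separate from the cache mirrors the stated benefit that the cache contents stay consistent while the handler performs its multiple writes.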

[0042] Advantageously, TLB cache entries can be preemptively updated under OS software control to prevent a TLB
miss by pre-loading a new entry, using the following sequence of operations:

1- control circuitry checks and selects a TLB entry, referred to as a victim TLB cache entry;

2- the software TLB handler writes to the TLB entry register; and

3- the software TLB handler sends a command to write the TLB entry, which transfers a value from TLB entry
register to the selected victim TLB cache entry.

[0043] The priority on the shared TLB is managed in the same way as priority on a memory access. One or more
resources can be using the shared TLB. One or more resources can program the shared TLB. The replacement
algorithm for selecting the next victim location in the shared TLB is under hardware control. A victim pointer register 322
is maintained for each TLB and µTLB to provide a separate victim pointer for each. A typical embodiment will use a
round robin scheme. Different TLBs within a single megacell can use different replacement schemes. However, in an
embodiment in which the system has a master CPU with a distributed OS, this master CPU could also bypass the
hardware replacement algorithm by selecting a victim entry, reading and then writing directly to the shared TLB, for
example.
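
A round-robin victim pointer that never selects a locked location, as required by the embodiments of paragraphs [0014] and [0017], might be modeled as below. This is a behavioral sketch only; the function name `next_victim` and the list-of-booleans representation of the lock bits are assumptions, and the actual control circuitry is described with the later figures.

```python
def next_victim(locked, current):
    """Advance a round-robin pointer to the next unlocked entry location.

    `locked` holds one boolean per TLB entry location; `current` is the
    present value of the victim pointer. Returns None if every location
    is locked.
    """
    n = len(locked)
    for step in range(1, n + 1):
        candidate = (current + step) % n
        if not locked[candidate]:
            return candidate
    return None
```

The `None` case cannot arise if some locations are reserved from being locked, as in the embodiment of paragraph [0015].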

[0044] In this embodiment, each shared TLB has 256 entries. Each µTLB is generally much smaller, i.e., has fewer
entries, than the shared TLB. In various embodiments, each shared TLB has 64-256 or more entries while µTLBs
generally have 4-16 entries. The penalty for a miss in a µTLB is small since a correct entry is generally available from
the shared TLB. Therefore, the present embodiment does not provide direct control of the victim pointers of the various
µTLBs; however, direct control of the victim pointer of shared TLBs, such as 212, 232, and 240, is provided.
[0045] Each entry in a TLB has a resource identifier 301 along with task-ID 302. Resource-IDs and task IDs are not
extension fields of the virtual address (VA) but simply address qualifiers. Resource IDs are provided by a resource-ID
register associated with each resource, such as R-ID register 342a associated with resource 340 and R-ID register
342n associated with resource 350. Resource 340 is representative of various DMA engines, coprocessors, etc. within
megacell 100 and/or an external host connected to megacell 100. Resource 350 is representative of various processors
within megacell 100. Each resource 340, 350 typically has its own associated R-ID register; however, various
embodiments may choose to provide resource ID registers for only a selected portion of the resources. A task ID is provided
by a task-ID register, such as task-ID register 344a associated with resource 340 and task-ID register 344n associated
with resource 350. A task register associated with a non-processor resource, such as DMA, a coprocessor, etc., is
loaded with a task value to indicate the task that it is supporting.

[0046] In another embodiment, only processor resources 340, 350 that execute program modules have an associated 
programmable task-ID register. In this case, a system wide default value may be provided for access requests initiated
by non-processor resources such as DMA. The default value may be provided by a programmable register or hardwired 
bus keepers, for example. 

[0047] Advantageously, with the task-ID, all entries in a TLB belonging to a specific task can be identified. They can,
for instance, be invalidated altogether through a single operation without affecting the other tasks. Advantageously,
the resource ID permits discrimination of different tasks being executed on different resources when they have the
same task number. Task-ID numbers on the different processors might not be related; therefore, task related operations
must be, in some cases, qualified by a resource-ID.

[0048] In another embodiment, the R-ID and Task-ID registers are not necessarily part of the resource core and can
be located elsewhere in the system, such as a memory mapped register for example, and associated to a resource
bus. The only constraint is that a task-ID register related to a CPU must be under the associated OS control and
updated during context switch. R-ID must be set during the system initialization. In some embodiments at system
initialization, all R-ID and Task-ID registers distributed across the system are set to zero, which is a default value that
causes the field to be ignored. In other embodiments, a different default value may be used. In other embodiments,
R-ID "registers" provide hardwired values.

[0049] Referring still to Figure 3A, each TLB entry includes a virtual address field 305 and a corresponding physical
address field 308 and address attributes 309. Various address attributes are described in Table 1 and Table 2. Address
attributes define conditions or states that apply to an entire section or page of the address space that is represented
by a given TLB entry. An S/P field 306 specifies a page size such as 64kB and 4kB for example. Naturally, the page
size determines how many most significant (ms) address bits are included in a check for an entry.
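
The relation between page size and the number of compared address bits can be made concrete with a short sketch. It assumes a 32-bit virtual address and power-of-two page sizes; `tag_mask` and `va_matches` are hypothetical helper names, not part of the described circuitry.

```python
ADDRESS_BITS = 32  # assumption: 32-bit virtual addresses


def tag_mask(page_size):
    """Mask selecting the most significant bits compared for a given page size."""
    offset_bits = page_size.bit_length() - 1   # valid for power-of-two sizes
    return ((1 << ADDRESS_BITS) - 1) & ~((1 << offset_bits) - 1)


def va_matches(entry_va, entry_page_size, request_va):
    """Compare only the bits above the page offset of the entry."""
    mask = tag_mask(entry_page_size)
    return (entry_va & mask) == (request_va & mask)
```

Under these assumptions a 4 kB entry compares the upper 20 bits and a 64 kB entry compares the upper 16 bits, so one 64 kB entry covers the same range as sixteen 4 kB entries.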

[0050] Each TLB entry also includes "shared" bit 303 and a lock bit 304. All entries marked as shared can be flushed
in one cycle globally. A V field 307 indicates if an associated TLB cache entry is valid. V field 307 includes several
V-bits that are respectively associated with R-ID field 301 to indicate if a valid R-ID entry is present, task-ID field 302 to
indicate if a valid task-ID entry is present, and virtual address field 305 to indicate if a valid address entry is present.
These valid bits enable the compare logic with their associated field.

[0051] As mentioned earlier, the resource ID field and task ID field in each entry of the TLB/µTLB can be used to
improve security. During program task execution, each transaction request is checked by the miss control circuitry of
the TLB/µTLB to determine if the entry is allowed for a specific resource or for all resources and for a specific task or
for all tasks. For example, if a request is received and a valid entry is present for the proffered virtual address but a
task ID or R-ID which accompanies the request does not match the corresponding valid task ID and R-ID fields of the
entry, then a miss is declared. If the task ID and/or R-ID fields of the entry are marked as invalid, then they are ignored.
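
The matching rule just described can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the names `TlbEntry` and `lookup`, the one-flag-per-field layout, and the use of Python are illustrative choices and are not part of the described circuitry.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TlbEntry:
    """Hypothetical software model of one TLB entry."""
    vpage: int                 # virtual page address (field 305)
    ppage: int                 # physical page address (field 308)
    r_id: int = 0              # resource ID (field 301)
    task_id: int = 0           # task ID (field 302)
    va_valid: bool = True      # V-bit for the virtual address
    r_id_valid: bool = False   # V-bit enabling the R-ID compare
    task_valid: bool = False   # V-bit enabling the task-ID compare


def lookup(tlb: Sequence[TlbEntry], vpage: int, r_id: int, task_id: int) -> Optional[int]:
    """Return the physical page on a hit, or None when a miss is declared."""
    for e in tlb:
        if not e.va_valid or e.vpage != vpage:
            continue
        # A qualifier field takes part in the compare only if its valid bit is set;
        # a field marked invalid is ignored.
        if e.r_id_valid and e.r_id != r_id:
            continue
        if e.task_valid and e.task_id != task_id:
            continue
        return e.ppage
    return None
```

In this model a request from the wrong task or resource misses even though the virtual address is present, while an entry whose qualifier fields are marked invalid is usable by every requester.
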
[0052] Figure 3B is a flow chart illustrating a method of operating the TLB of Figure 3A. As discussed above, the
TLB is filled during the normal course of action by a set of translated address entries along with qualifier fields that are
incorporated with each entry. As will be described in more detail below, operations can now be performed on the TLB
that are qualified by the various qualifier fields.

[0053] In step 360, an operation command is received by the control circuitry of the TLB. This command is sent by
the MMU manager during the course of operation. Commands are sent as needed to flush (invalidate), lock or unlock
selected entries within the TLB. These operations will be described in detail later.

[0054] Step 362 accesses a first entry in the TLB and reads the qualifier field specified by the operation command.
This can be task ID field 302, resource ID field 301, shared indicator 303, or combinations of these. Operation
commands can also specify a selected virtual address entry.

[0055] Step 364 compares the qualifier specified by the operation command with the qualifier field read from the TLB
entry. If they match, then the operation is performed on that entry in step 366. If they do not match, then the next entry
is accessed in step 368 and compare step 364 is repeated for the next entry.

[0056] Step 366 performs the operation specified in the operation command on each entry whose qualifier field(s) 
match the operation command. In this embodiment, the operation can invalidate an entry by resetting valid bit field 
307, and lock or unlock an entry by appropriate setting of lock bit 304. 

[0057] Step 368 accesses each next TLB entry until all entries have been accessed. In this embodiment, all µTLBs
associated with a shared TLB are also accessed as part of the same operation command.
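
The loop of steps 360 to 368 can be summarised in a short behavioural sketch. The class and function names (`Op`, `Entry`, `Command`, `run_command`) are hypothetical, valid field 307 is collapsed to a single flag, and skipping entries that are already invalid is an assumption made for the model rather than something the flow chart states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Op(Enum):
    INVALIDATE = "invalidate"
    LOCK = "lock"
    UNLOCK = "unlock"


@dataclass
class Entry:
    task_id: int           # task ID field 302
    r_id: int              # resource ID field 301
    shared: bool = False   # shared bit 303
    locked: bool = False   # lock bit 304
    valid: bool = True     # valid field 307, collapsed to one flag here


@dataclass
class Command:
    op: Op
    task_id: Optional[int] = None   # None means "qualifier not specified"
    r_id: Optional[int] = None
    shared: Optional[bool] = None


def matches(e: Entry, c: Command) -> bool:
    """Step 364: every qualifier named in the command must match the entry."""
    return ((c.task_id is None or e.task_id == c.task_id) and
            (c.r_id is None or e.r_id == c.r_id) and
            (c.shared is None or e.shared == c.shared))


def run_command(tlb: List[Entry], c: Command) -> int:
    """Steps 362-368: visit every entry and apply the operation to each match."""
    touched = 0
    for e in tlb:                                # steps 362 and 368
        if not e.valid or not matches(e, c):     # step 364 (invalid entries skipped: assumption)
            continue
        if c.op is Op.INVALIDATE:                # step 366
            e.valid = False
        else:
            e.locked = (c.op is Op.LOCK)
        touched += 1
    return touched
```

For example, `run_command(tlb, Command(Op.INVALIDATE, task_id=1))` flushes every valid entry of task 1 in one command, regardless of which resource owns it.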

[0058] Other embodiments may provide additional or different operations that are qualified by the qualifier fields of 
the present embodiment or by additional or other types of qualifier fields. For example, resource type, power consump- 
tion, processor speed, instruction set family, and the like may be incorporated in the TLB and used to qualify operations 
on the TLB. 

[0059] Figure 4 is a block diagram of a digital system similar to that of Figure 1 illustrating a cloud of tasks that are
scheduled for execution on the various processors of the digital system. Typically, each software task includes a task
priority value that is commonly used by an operating system to schedule an order of execution for a set of pending
tasks 1440.

[0060] In this illustration, a circle such as 1442 represents a task, with a task name "c" and a task priority of 12, for
example. Likewise, task 1443 has a task name "r" and a priority of 15, where a lower number indicates a higher priority.
If the set of tasks 1440 are assigned to three processors, then an operating system on each processor forms a ready
to execute queue, such as ready queue 1446 in which task "c" is scheduled for first execution, then task "a" and finally
task "b" according to priority values of 12, 15, and 50 respectively. The Task ID register in each processor is loaded
when a task is invoked.
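
As a minimal sketch of the ordering described above, the following model sorts pending tasks by priority and loads a task-ID register when a task is dispatched. The `Task` and `Processor` classes and the numeric task-ID values are hypothetical; only the priorities 12, 15 and 50 come from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int   # a lower number indicates a higher priority
    task_id: int    # value loaded into the task-ID register (illustrative)


class Processor:
    def __init__(self) -> None:
        self.task_id_register = 0       # zero is the "ignored" default value
        self.ready: List[Task] = []

    def schedule(self, tasks: List[Task]) -> None:
        """Form the ready-to-execute queue in priority order."""
        self.ready = sorted(tasks, key=lambda t: t.priority)

    def dispatch(self) -> Optional[Task]:
        """Invoke the next task and load its task ID into the register."""
        if not self.ready:
            return None
        task = self.ready.pop(0)
        self.task_id_register = task.task_id
        return task
```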

[0061] Table 3 illustrates several portions of instruction code sequences in which a task is spawned. From line 1 to
line 5, task "c" is active and spawns a new task, "audio", on line 5. The kernel is then invoked to instantiate the new
task and create the associated TCB. An eight bit (numbers of bits can be more or less) task-ID field is memorised in
the TCB at line 11. During the context switch (reschedule in line 13) before launching the "audio" task, the kernel loads
task-ID register 1412 with the task-ID value held in the TCB (Table 4) or in another table. At line 14, the new task is
now active.

Table 3 - Setting Task ID at the Start of a Task

1    // (Task c code execution)
2    Instruction 1
3    ...
4    Instruction n
5    TaskSpawn("audio",200,0,5000,(FUNCPTR)audio)  // (Task c code execution: instruction n+2)
6    // (Kernel code execution)
7
8    TaskCreate()
9    // (taskcreate code execution)
10
11   SetTaskAttributeID(TID)
12
13   // Kernel reschedule code execution
14   // (Task Audio code execution)
15   Instruction 1
16

[0062] Table 4 is an example task control block that is used to define a task-ID. Typically, the OS uses a 32-bit task-ID
that is in fact an address that enables the OS to locate task information (TCB). At line 4, an execution priority value is
defined that is used by the operating system to schedule execution of the task. At line 5, a task-ID value is defined that
is used to set the task ID register when the task is instantiated.

Table 4 - Setting Task ID Using a TCB

1    TCB (task control block)
2    Typedef struct TCB
3    {
4    UINT OS-priority
5    UINT Task_ID
6    ...
7    #if CPU_FAMILY==xx
8    EXC_INFO excInfo;
9    REG_SET regs;
10
11   #endif
12   }


[0063] In other embodiments, other means than a TCB may be provided for storing the task ID.
[0064] Referring again to Figure 3A, task-ID field 302 can be set in response to information provided at line 5 of the
TCB illustrated in Table 4. This information can be used directly by the MMU manager when loading a new entry in
TLBs. This information could also be part of the page table descriptor in the MMU page table and loaded as part of
the MMU software table walk.

[0065] In the present embodiment, task-ID information is not maintained in page tables but is inserted by the TLB
miss handler at the time of a TLB fault by using the task_ID value of the transaction request that caused the TLB fault.
Other embodiments may use other means for setting the task-ID field in the TLB entry, such as by storing this information
in a separate table or in the MMU page tables, for example. In the present embodiment the Valid bit associated with
the task-ID field is loaded through the MMU table walk and is part of the MMU tables. Thus, when the TLB miss handler
accesses a page table in response to a TLB miss, it queries the task-ID valid bit field of the MMU page table; if this bit
field is asserted, then the TLB miss handler asserts the task-ID valid bit in the TLB entry and loads the task-ID value
from the task-ID register of the requester that caused the TLB miss into task ID field 302. If the task-ID valid bit field
of the MMU page table is not asserted, then the TLB miss handler deasserts the task-ID valid bit in the TLB entry and
the task-ID value from the task-ID register of the requester that caused the TLB miss is ignored.
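
The entry-fill behaviour of the TLB miss handler described above can be sketched as follows. The types `PageTableEntry` and `TlbEntry`, the dictionary used as a page table, and the function `fill_on_miss` are hypothetical; storing zero when the task-ID is ignored is an assumption of the model. The shared bit and R-ID are filled as described in paragraphs [0066] and [0068].

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PageTableEntry:
    ppage: int
    task_id_valid: bool    # task-ID valid bit held in the MMU page table
    shared: bool = False   # shared bit, also loaded by the table walk


@dataclass
class TlbEntry:
    vpage: int
    ppage: int
    r_id: int
    task_id: int
    task_id_valid: bool
    shared: bool


def fill_on_miss(page_table: Dict[int, PageTableEntry], vpage: int,
                 req_r_id: int, req_task_id: int) -> TlbEntry:
    """Build the TLB entry for the request that caused the miss."""
    pte = page_table[vpage]
    return TlbEntry(
        vpage=vpage,
        ppage=pte.ppage,
        r_id=req_r_id,                                       # R-ID of the faulting request
        # The requester's task ID is stored only when the page table marks it valid.
        task_id=req_task_id if pte.task_id_valid else 0,     # 0 = ignored (assumption)
        task_id_valid=pte.task_id_valid,
        shared=pte.shared,
    )
```
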
[0066] In the present embodiment, the shared bit field 303 is loaded through the MMU table walk and is part of the 
MMU tables. Typically, shared pages are defined by the OS in response to semaphore commands, for example. 
[0067] In another embodiment, shared bit information is not maintained in page tables but is inserted by the TLB-miss
handler at the time of a TLB fault by accessing the TCB directly based on the task ID of the request that caused
the fault. The TCB is located by the TLB-miss handler via a look-up table keyed to the task ID value. Other embodiments
may use other means for setting the shared bit in the TLB entry by storing this information in a separate table, for
example.

[0068] R-ID field 301 is set by using the R-ID of the request that caused the fault. A Master CPU could also load
a value in this field during the programming of a TLB entry by taking this information from the MMU tables or separate
tables, for example.

[0069] Figure 5 illustrates a TLB control word format used to operate on the TLB and µTLBs of Figure 3A in response
to control operations as defined in Table 5. TLB control word format 400 includes a task-ID field 402, resource-ID field
404 and virtual address field 406. Note that the virtual address field refers to a page address, therefore lsb address
bits that refer within a page are not needed. In some embodiments, certain of the processors might not be allowed to
invalidate entries other than their own.

[0070] As described previously, during execution of a program, the R-ID and Task-ID fields come from a register
associated with a requester during each memory system access request. In a system embodiment with multi-processors
with multiple independent Operating Systems (OS), the R-ID is static and indicates which of the resources is
accessing a given location (address). The Task-ID indicates which of the tasks (or processes) of this resource is doing
the access. The task ID is dynamic and changes on each context switch. For these systems, restricting operations on
a system TLB to the associated resource is important to optimize the main system TLB usage. Each OS controls the
TLB entries it uses.

[0071] However, another system embodiment might be controlled by middleware that supports a unified task and
memory management. For those, the notion of R-ID might disappear and be treated as part of the task_ID. Restriction
of TLB commands based on R-ID would not be necessary in those systems and the R-ID field could be re-used to extend
the task-ID field. In that case, TLB control format 410 may be used in which the R_ID field is not needed. Recall that
the R-ID of the requestor is provided with each transaction request, therefore control operations specified using format
410 can be confined to entries associated with the requestor.

[0072] A processor can initiate various control operations on a TLB by writing a control word conforming to the appropriate
format to a specific memory mapped address associated with TLB controller 320. This control word can specify a target
virtual address entry and an associated task ID or an associated resource ID. Depending on the operation, unneeded
fields are ignored. For example, the operation "invalidate all entries related to an R-ID" will only use the R-ID field 404.
The format and type of operation can be distinguished by using different memory mapped addresses, for example.
Each address corresponds to a different TLB operation. Another embodiment would be to use a different processor
instruction opcode for each of the TLB operations that would drive the appropriate control signal connected to TLB
controller 2232. A state machine in TLB controller 320 then executes the requested control operation. These TLB
control operations are listed in Table 5. These operations are described in more detail below. For many of the operations,
certain processors in an embodiment will be restricted to only affecting their own entries. This restriction is enforced
by using the resource-ID signals 2106 provided with each write to TLB controller 320 as part of each memory access
request.

Table 5 - TLB Control Operation

Invalidate entry with VA
Invalidate all entries related to a Task-ID
Invalidate all entries related to a R-ID
Invalidate all shared entries
Invalidate all entries of a task except shared
Invalidate all entries
Lock/Unlock entry
Lock/Unlock all entries related to a task-ID/R-ID
Read TLB entry
Write TLB entry
Check and select victim TLB entry
Set victim TLB entry



[0073] In another embodiment, the control operations can be invoked by executing an instruction that invokes a
hardware or software trap response. As part of this trap response, a sequence of instructions can be executed or a
control word can be written to a selected address, for example. In another embodiment, one of the processors may
include instruction decoding and an internal state machine(s) to perform a TLB or Cache control operation in response
to executing certain instructions which may include parameters to specify the requested operation, for example.
[0074] For an "invalidate entry" operation, a virtual page address (VA) is provided in VA field 406 of the control word
and the other fields of the control word are ignored. This generates an entry invalidate operation on the corresponding
virtual address entry. Note that all processors of a given megacell embodiment might not be allowed to invalidate entries
other than their own. In that case, the R-ID value from the R-ID register of the requestor is used to qualify the operation.
[0075] For an "invalidate all entries related to a task" operation, all entries corresponding to the provided task identifier
are invalidated. This allows a master-processor to free space from the shared TLB by invalidating all entries of a task
belonging to another processor. In this case, the control word provides a task-ID value and an R_ID value. Processors
other than the master-processor can free space from the shared TLB by invalidating all entries of one of their own tasks.
This operation invalidates all the entries corresponding to the provided task and resource identifier or to a task of the
resource requesting the operation. The R-ID value from the R-ID register of the requestor is used to qualify the
operation.

[0076] For an "invalidate all entries related to a Resource" operation, all entries corresponding to R-ID field 404 of the
control word are invalidated. Note that all processors of a given megacell embodiment might not be allowed to invalidate
entries other than their own. This provides, however, the capability to a master processor to free space from the shared
TLB by invalidating all entries of another processor. The R-ID value from the R-ID register of the requestor is used to
qualify the operation.

[0077] For an "invalidate all shared entries" operation, all entries in the TLB marked as shared for the requester are
invalidated. The R-ID register value limits the effect of this operation, as discussed above.

[0078] For an "invalidate all entries of a task except shared entries" operation, all entries in the TLB for a task specified
in the control word that are not marked as shared for the requester are invalidated. The R-ID value from the R-ID register
of the requestor limits the effect of this operation, as discussed above.
[0079] For an "invalidate all entries" operation, all entries in the TLB matching the R-ID of the requester are
invalidated. For the master CPU, the operation invalidates all entries regardless of the R-ID. If all of the R-ID registers
distributed in the system have the same value, then this operation invalidates all entries.

[0080] For a "lock/unlock entry" operation, a control word is written providing the VA which needs to be
locked/unlocked. This operation sets or resets lock field 304 in the selected entry. Restriction on R-ID applies as above.
[0081] For a "lock/unlock all entries related to a task" operation, a control word is written providing the task identifier
which needs to be locked/unlocked. Restriction on R-ID applies as above.
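
The R-ID restriction that applies to these operations can be illustrated with a short model. It is a sketch only: the function names, the `is_master` flag and the optional `target_r_id` argument are hypothetical ways of expressing the master-processor privilege and the R-ID field of the control word.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    vpage: int
    task_id: int
    r_id: int
    shared: bool = False
    locked: bool = False
    valid: bool = True


def _allowed(e: Entry, requester_r_id: int, is_master: bool) -> bool:
    """A non-master requester may only affect entries carrying its own R-ID."""
    return is_master or e.r_id == requester_r_id


def invalidate_task(tlb: List[Entry], task_id: int, requester_r_id: int,
                    is_master: bool = False, target_r_id: Optional[int] = None,
                    keep_shared: bool = False) -> int:
    """Invalidate all entries of a task, optionally sparing shared entries."""
    count = 0
    for e in tlb:
        if not e.valid or e.task_id != task_id:
            continue
        if not _allowed(e, requester_r_id, is_master):
            continue
        if target_r_id is not None and e.r_id != target_r_id:
            continue
        if keep_shared and e.shared:
            continue
        e.valid = False
        count += 1
    return count


def set_lock_for_task(tlb: List[Entry], task_id: int, requester_r_id: int,
                      lock: bool, is_master: bool = False) -> int:
    """Lock (lock=True) or unlock (lock=False) all entries of a task."""
    count = 0
    for e in tlb:
        if e.valid and e.task_id == task_id and _allowed(e, requester_r_id, is_master):
            e.locked = lock
            count += 1
    return count
```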

[0082] In the case in which an independent OS is running on each processor, each OS can initiate the above
operations. In that case, these operations must be restricted to entries with a resource identifier (R-ID) belonging to the
requester.

[0083] In the case of a single master OS, task and memory management can be viewed as unified, removing the
need for an R-ID. The R-ID can be an extension of the task-ID. In an embodiment in which the R-ID is hard-coded,
the R-ID field in the TLB simply needs to be disabled (associated Valid bit is cleared) via a configuration control register.
Disabling the R-ID is equivalent to having a single R-ID for all the system or for part of the system.
[0084] As mentioned above, a global control bit can be used in an embodiment to determine if all the above functions
must be limited to the entry corresponding to the resource ID requesting the operation.

[0085] Although it is preferable to have the same page size for memory management on all processors, it is not
mandatory. In a shared system, the TLB supports all page sizes of the system, in response to S/P field 306. Therefore,
in a different embodiment, a TLB may support a different set of page sizes.

[0086] Table 5 also lists some additional operations that are provided which allow a software TLB handler to access
the shared system TLB: Read TLB entry, Write TLB entry, Check and select victim TLB entry, and Set victim TLB entry.
These are described in more detail below.

[0087] For a "Read TLB entry" operation, an entry in the TLB pointed to by the victim pointer is transferred into TLB
entry register 330. The TLB entry register can then be read and analyzed by the software TLB handler. Again this
operation might be restricted to the master CPU for security.
[0088] For a "write TLB entry" operation, the contents of the TLB entry register are transferred to a selected victim
entry of the TLB.

[0089] The "check and select victim TLB entry" operation has multiple functions. Its first purpose is to determine an
index value for the replacement of an entry. However, it can also be used to find out if an entry is already in the TLB.
The R_ID & Task_ID & VA fields of a corresponding entry are checked for a match against a proffered virtual address
entry. If there is no match, then the victim pointer is positioned according to the chosen replacement algorithm. This
replacement can be random, cyclic, etc. The second usage is to verify if a given page is present in the TLB. If a matching
entry is found, the victim entry points to this matching entry, and a flag bit in the status register is set to indicate this
condition.
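
The two uses of the "check and select victim TLB entry" operation can be sketched as follows. The class `VictimSelector` is hypothetical, and a cyclic policy is chosen here purely for illustration; the text allows random, cyclic or other replacement algorithms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    vpage: int
    task_id: int
    r_id: int
    valid: bool = True


class VictimSelector:
    """Model of the victim pointer and the match flag of the status register."""

    def __init__(self, tlb: List[Entry]) -> None:
        self.tlb = tlb
        self.victim = 0
        self.match_flag = False

    def check_and_select(self, vpage: int, task_id: int, r_id: int) -> int:
        # Second usage: if the page is already present, point at it and set the flag.
        for i, e in enumerate(self.tlb):
            if e.valid and (e.vpage, e.task_id, e.r_id) == (vpage, task_id, r_id):
                self.victim, self.match_flag = i, True
                return i
        # First usage: no match, so position the pointer by the replacement policy.
        self.match_flag = False
        self.victim = (self.victim + 1) % len(self.tlb)   # cyclic policy (assumption)
        return self.victim
```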

[0090] The "Set victim TLB entry" operation allows the software TLB handler to select a particular entry as the next
victim. This is useful to support certain lock mechanisms and software replacement algorithms.

[0091] As indicated earlier, each control operation is performed by a state machine within TLB control circuitry 320
in response to writing to a selected memory mapped address. For example, for the operation "invalidate all entries
related to a task", all entries with a matching task-id TAG are invalidated in response to a single command, including
the shared TLB and the associated µTLBs. In the present embodiment in which the TLB is a fully associative memory,
the operation can be done in one cycle or as a loop as most appropriate.

[0092] As mentioned above, control operations affect the shared TLB and the associated µTLBs for the various
operations based on task-ID, resource-ID and shared bits. In an embodiment in which both µTLBs and TLB are fully
associative, the flush and/or lock/unlock can be done by the same command in the same cycle. But if the µTLB is fully
associative and the TLB is set associative, for example, a single command is still used, but the operation on the set
associative TLB will be executed entry by entry by a HW loop. This will take a longer time. If both the µTLB and TLB are
fully associative there will typically be a single control block. If the µTLB is fully associative and the TLB set associative,
there may be separate control blocks 320, but the same command affects all of the control blocks. Alternatively, an
embodiment may require sending copies of the operation command separately to each of the separate control blocks.
[0093] Figure 6 is a simplified block diagram of the TLB of Figure 3A and will now be referred to explain selective
invalidation of an entry for a given task or resource, as listed in Table 5. Processor 2100(m) is representative of one
or more requestors that access TLB 2130. A physical address bus 2104(m), resource ID signals 2106(m), and task ID
signals 2108(n) are provided by each processor 2100(n) for each TLB request. Traffic controller 2110 provides request
priority selection and sends the highest priority request to TLB 2130 using physical address bus 2104, resource ID
signals 2106, and task ID signals 2108 to completely identify each request.

[0094] A task-ID field 302 and/or a resource ID field 301 stored as independent fields in the TLB TAG array is used
to selectively invalidate (flush) all entries of a given task or a given resource (requester). A state machine within control
circuitry 2132 receives a directive from a processor to perform an invalidation operation, as described above. The
operation directive specifies which task-ID is to be flushed using format 400 or 410 (see Figure 5).
[0095] For operations which use task ID field 402 in the control word, state machine 2132 accesses each entry in
TLB 2130, examines the task-ID field, and if there is a match, that entry is flushed by marking the valid bits in its valid
field 307 as not valid. Thus, a single operation is provided to flush all entries of a given task located in a TLB. As
discussed above, in this embodiment, the TLB cache is made of several levels of set associative TLB and µTLB, and
all levels are flushed simultaneously in response to a single operation directive command by accessing each entry
sequentially in a hardware controlled loop.

[0096] For operations which use both task ID field 402 and R-ID field 404 in the control word, state machine 2132
accesses each entry in TLB 2130, examines the task-ID field and the resource ID field, and if there is a match in both
the task ID and R-ID fields, that entry is flushed by marking all valid bits in its valid field 307 as not valid. Advantageously,
this allows discrimination of entries belonging to tasks from different resources that have the same task ID number.
When the R-ID valid bit is set, an entry is not flushed if its R-ID field 301 does not match the value provided on R-ID
signals 2106. This operation only invalidates entries with a valid task-ID.

[0097] In a similar manner, the selective invalidation operation "Invalidate all entries related to a R-ID" is performed
by examining the R-ID field 301 of each entry, and if there is a match in the R-ID field, that entry is flushed by marking
its valid field 307 as not valid. This operation only invalidates entries with a valid R-ID.

[0098] Likewise, the selective invalidation operation "Invalidate all shared entries" is performed by examining the
shared field 303 of each entry, and if there is a match in the shared field, that entry is flushed by marking its valid field
307 as not valid. All entries marked as shared can be flushed in one cycle.

[0099] In the present embodiment, when shared entries are flushed, state machine 2132 ignores the task ID field
since shared page entries may be used by different tasks having different task IDs. In an alternative embodiment,
shared entry flushing could also be qualified by the task ID field. Alternatively, shared entry flushing could also be
qualified by the task ID field, but only if the task ID valid bit in valid field 307 is asserted, indicating a valid task ID value
is in field 302.

[0100] Figure 7 is a simplified block diagram of the TLB of Figure 3A and will now be referred to explain selective
lock/unlocking of an entry for a given task or resource, as listed in Table 5. Advantageously, in this multi-processor
system with system shared TLB, an innovative scheme of adaptive replacement is provided for controlling the TLB on
a task basis, as discussed above. In order to support such a function in the most optimized way, an adaptive replacement
algorithm taking into account locked entries and empty entries is provided. TLB full signal 2240 is asserted when one
or more valid bits in field 307 are asserted for each TLB entry location. TLB miss signal 2242 is asserted to indicate a
miss occurred in response to a transaction request from processor 2100(m), which invokes a TLB handler as described
earlier.

[0101] When the TLB is full with no locked entries, pseudo-random replacement based on a simple counter (Victim
CNT) 2234 is used to select the victim entry. Another embodiment would be to keep a pseudo random replacement
and to check the lock bit on a miss. If it is locked, signal 2244 is asserted and the victim counter is incremented further
until a non-locked entry is found. This is done automatically by the control circuitry connected to victim counter 2234
so that response time of the TLB handler routine is not impacted.

[0102] When the TLB is not full, the victim counter is incremented until an empty entry is found. This is done
automatically by the control circuitry connected to victim counter 2234 so that response time of the TLB handler routine is
not impacted.

[0103] After a flush entry operation is performed, the victim "counter" is updated with the location value of the flushed
entry and stays unchanged until a new line is loaded in order to avoid unnecessary searching.
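
The adaptive behaviour of the victim counter can be modelled in a few lines. This is a sketch under assumptions: the names `Slot`, `next_victim` and `flush` are hypothetical, the search starts at the current counter value, and an exception stands in for the case, not covered above, in which every entry is both valid and locked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    valid: bool = False
    locked: bool = False


def next_victim(slots: List[Slot], counter: int) -> int:
    """Advance the victim counter to the next replaceable slot.

    TLB not full: stop on the first empty (not valid) slot.
    TLB full:     skip locked slots and stop on the first unlocked one.
    """
    n = len(slots)
    full = all(s.valid for s in slots)       # models TLB full signal 2240
    for step in range(n):
        i = (counter + step) % n
        s = slots[i]
        if (not full and not s.valid) or (full and not s.locked):
            return i
    raise RuntimeError("all entries are valid and locked")


def flush(slots: List[Slot], index: int) -> int:
    """Invalidate one slot; the returned counter value points at the freed slot."""
    slots[index].valid = False
    return index
```

Because `flush` returns the freed location, the next call to `next_victim` stops immediately, which mirrors the stated goal of avoiding unnecessary searching.
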
[0104] An alternative implementation provides the capability to do the victim entry search instantaneously by
providing in an external logic the lock and valid bit or by using a CAM, for example. In another alternative embodiment, a
shift register and associated circuitry is used to point to the next location in the TLB that is either not valid or valid and
not locked.

[0105] Figure 8A is a schematic illustrating an alternative embodiment of victim selection circuitry 2234 that utilizes
a shift register for adaptive replacement of TLB entries in the TLB of Figure 3A. Shifter 850 is used to point to the next
location in the TLB that is either not valid or valid AND not locked. Assuming the TLB has n entries, shifter 850 has
only n-1 positions. Position zero of the TLB is reserved as a victim location when all entries are valid and all entries 1
to n-1 are locked.

[0106] Lock bits 804 are equivalent to lock field 304 of Figure 3A, except they are implemented as individual storage
bits rather than as part of a TLB memory array so that they can be monitored by reservation circuitry comprising AND
gate 854 to form a shift control signal 856 that is asserted when all of the monitored lock bits are asserted. Also, the
individual lock bits can be set by individual gates 852[x] in response to a lock operation. When signal 856 is asserted,
only entry location S[0] is available. If the shifter indicates S[0] and a lock request occurs, an error is signaled to the
CPU by gate 852[0] because the position S[0] is reserved as unlockable. The TLB miss handler in the OS can then
decide to remove one of the already locked entries in order to lock this new one.

[0107] Valid bits 807 are equivalent to the VA valid bit of valid field 307 of Figure 3A, except they are implemented
as individual storage bits rather than as part of a TLB memory array so that they can be individually monitored by AND
gate 858 to form an "all valid" control signal 860 that is asserted when all of the monitored valid bits are asserted.
[0108] During operation, shifter 850 has one bit set to one, such as bit S[1], and all the other bits set to zero, selecting
the S[1] entry as a candidate victim entry. If either the S[1] entry is not valid (V[1] = 0) or all entries are valid and the
S[1] entry is not locked (all_Valid AND V[1] = 1 AND L[1] = 0), as determined by skip circuit 870[1], the shifter stops with
a stop_shifter signal 874 asserted. Signal 874 is provided by OR gate 872 which receives outputs from each of a set
of skip circuits 870[x] connected to each of valid bits 807 V[1]-V[n-1]. In this case, entry S[1] is selected as the next
victim entry.

[0109] Otherwise, the shifter continues its search. S[2] is set and all the other bits of the shifter are zero. The same
condition is checked for the S[2] position and if true, the shifter stops on the victim entry 2. Otherwise, the shifter continues
until an unlocked entry is found and selected. By doing one check per clock cycle, the shifter stops on the first sequential
position it finds available for replacement.

[0110] The shifter starts (enable_clk = 1) a search after each new load entry (TLB-miss). Advantageously, the latency
to find the victim entry is therefore hidden due to TLB-miss occurrence and the time required to handle the TLB-miss,
which may be many CPU cycles.
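
The stop condition of the shifter can be written out as a behavioural sketch. The function name `shifter_select` is hypothetical, and it is assumed here that the "all valid" and "all locked" signals monitor positions 1 to n-1 only, with position 0 being the reserved, unlockable location.

```python
from typing import List


def shifter_select(valid: List[bool], locked: List[bool], start: int = 1) -> int:
    """Return the next victim position for a TLB with n = len(valid) entries.

    Positions 1..n-1 are scanned one per "clock cycle"; position 0 is the
    reserved location used when every other entry is valid and locked.
    """
    n = len(valid)
    all_valid = all(valid[1:])     # models "all valid" signal 860 (assumed range 1..n-1)
    all_locked = all(locked[1:])   # models shift control signal 856
    if all_valid and all_locked:
        return 0                   # only entry location S[0] is available
    pos = start
    for _ in range(n - 1):
        # Skip circuit: stop if the entry is empty, or if the TLB is full
        # and the entry is not locked.
        if (not valid[pos]) or (all_valid and not locked[pos]):
            return pos             # stop_shifter signal 874 asserted
        pos = pos + 1 if pos < n - 1 else 1   # wrap within positions 1..n-1
    raise AssertionError("unreachable: a free or unlocked position always exists here")
```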

[0111] Still referring to Figure 8A, an embodiment is illustrated that has one reserved unlockable entry. Other
embodiments may have several unlockable entries reserved (m). In that case, once n-m are locked (all-locked = true),
the victim entry selection iterates cyclically between 0 and m-1.

[0112] The position S[0], reserved for the all_locked case, is not otherwise used, meaning that the TLB size is really
n-1. An alternative embodiment would be to have a shifter with n locations to avoid losing any entry. A lock request on
unlockable entry zero (S[0]) would raise a flag and position the victim pointer on the first unlocked entry. The CPU can
then read the content of the victim location and decide to use this entry to lock the desired entry. This would add potential
latency on the lock operation but remove the loss of TLB entries.

[0113] Another alternative implementation to avoid loss of a TLB entry is to execute the all-taskid-unlocked operation
as a loop of n. In that case, a "locked-counter" can be used to detect if more than n-m entries are locked and to thereby
keep m entries unlocked. Every new lock entry request increments the locked-counter. The locked-counter is
decremented through the all-task-id-unlocked loop.

[0114] Figure 8B is a schematic illustrating an alternative embodiment of the control circuitry of Figure 8A using
reservation circuitry comprising a locked-counter 880 and comparator 882. Signal lock-auth(orized) 884 remains
asserted as long as the count value is less than n-m. In this implementation an unlock operation takes n cycles, but no TLB
entry is lost. If a new lock request occurs once lock_auth is cleared, the new entry is not locked, but gate 886 asserts
an error signal that can set a flag or an interrupt error can be returned to the CPU. The OS lock-error handler can then
decide if another entry can be unlocked to let the new one be locked.
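
The reservation scheme of Figure 8B reduces to a counter and a comparison, as the following sketch shows. The class `LockReservation` and its method names are hypothetical; a `False` return value stands in for the error signal asserted by gate 886.

```python
class LockReservation:
    """Model of locked-counter 880 and comparator 882 for n entries, m kept unlocked."""

    def __init__(self, n: int, m: int) -> None:
        if m < 1 or m > n:
            raise ValueError("m must be between 1 and n")
        self.limit = n - m   # at most n-m entries may be locked
        self.count = 0       # locked-counter value

    @property
    def lock_auth(self) -> bool:
        """Asserted as long as the count value is less than n-m."""
        return self.count < self.limit

    def request_lock(self) -> bool:
        """Grant the lock and count it, or refuse it when lock_auth is cleared."""
        if not self.lock_auth:
            return False     # error: the new entry is not locked
        self.count += 1
        return True

    def unlock_one(self) -> None:
        """Called once per entry unlocked by the unlock loop."""
        if self.count > 0:
            self.count -= 1
```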

[0115] In this embodiment, lock bits L[n] can be part of the TLB cache memory instead of discrete logic because all
lock and unlock operations are done one entry at a time (selected by the shifter). Similarly, if an up-down counter is
provided to generate the all_valid signal, then the V[n] bits can also be part of the TLB memory. The skip logic can be
reduced to a single set on the output of the memory and the OR gate 872 is removed.

[0116] Referring again to Figure 7, the function "Lock/Unlock all entries of a given task" listed in Table 5 is
implemented by the comparison of the task-id field 302 of each entry in the TLB. If this field matches a task-id value 402
supplied in the control word (see Figure 5), the entry is locked by setting the associated lock bit 304 or unlocked by
clearing the associated lock bit 304 of each matching entry, depending on the requested operation. In an embodiment
of a TLB implemented with a memory array, the function is done through a hardware loop using a finite state machine
located in control circuitry 2232, for example. In an alternative embodiment of a TLB implemented with a content
addressable memory (CAM), all entries with the same task-ID can be locked or unlocked in one cycle.

[0117] As discussed above, lock/unlock requests are restricted by the R-ID provided on signals 2106. When R-ID
field 301 does not match the value provided on R-ID signals 2106, the entry is not locked or unlocked.

[0118] Thus, a lock/unlock operation on the TLB based on task-ID and optionally qualified by R-ID is provided. A
pseudo-random replacement algorithm for the TLB is changed into a sequential replacement algorithm upon detecting
an empty entry location or a locked victim entry location.
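
For illustration, the task-qualified lock/unlock function can be sketched as a software loop over the entries, analogous to the hardware loop of the memory-array embodiment. This is a minimal sketch under stated assumptions: the names `TlbEntry` and `set_lock_for_task` are hypothetical, the optional `resource_id` argument stands in for the value on the R-ID signals, and the returned match count is added only for convenience.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    virtual_address: int
    task_id: int          # models the task-id field of the entry
    resource_id: int      # models the R-ID field of the entry
    locked: bool = False  # models the lock bit of the entry


def set_lock_for_task(entries: List[TlbEntry], task_id: int, lock: bool,
                      resource_id: Optional[int] = None) -> int:
    """Lock (lock=True) or unlock (lock=False) every entry of one task.

    When resource_id is given, entries whose R-ID differs are left untouched.
    Returns the number of entries that matched the qualifiers.
    """
    matched = 0
    for entry in entries:  # one entry per step, as in a hardware loop
        if entry.task_id != task_id:
            continue
        if resource_id is not None and entry.resource_id != resource_id:
            continue
        entry.locked = lock
        matched += 1
    return matched
```

A call such as `set_lock_for_task(tlb, task_id=3, lock=True, resource_id=0)` changes only the entries that carry both task-ID 3 and R-ID 0; omitting `resource_id` applies the operation to every entry of the task.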

Digital System Embodiment 

[0119] Figure 9 illustrates an exemplary implementation of such an integrated circuit in a mobile
telecommunications device, such as a mobile personal digital assistant (PDA) 10 with display 14 and integrated input
sensors 12a, 12b located in the periphery of display 14. As shown in Figure 9, digital system 10 includes a megacell
100 according to Figure 1 that is connected to the input sensors 12a,b via an adapter (not shown), as an MPU private
peripheral 142. A stylus or finger can be used to input information to the PDA via input sensors 12a,b. Display 14 is
connected to megacell 100 via a local frame buffer similar to frame buffer 136. Display 14 provides graphical and video
output in overlapping windows, such as MPEG video window 14a, shared text document window 14b and
three-dimensional game window 14c, for example.

[0120] Radio frequency (RF) circuitry (not shown) is connected to an aerial 18 and is driven by megacell 100 as a
DSP private peripheral 140 and provides a wireless network link. Connector 20 is connected to a cable adaptor-modem
(not shown) and thence to megacell 100 as a DSP private peripheral 140, and provides a wired network link for use
during stationary usage in an office environment, for example. A short distance wireless link 23 is also "connected" to
ear piece 22 and is driven by a low power transmitter (not shown) connected to megacell 100 as a DSP private peripheral
140. Microphone 24 is similarly connected to megacell 100 such that two-way audio information can be exchanged
with other users on the wireless or wired network using microphone 24 and wireless ear piece 22.

[0121] Megacell 100 provides all encoding and decoding for audio and video/graphical information being sent and
received via the wireless network link and/or the wire-based network link.

[0122] It is contemplated, of course, that many other types of communications systems and computer systems may
also benefit from the present invention, particularly those relying on battery power. Examples of such other computer
systems include portable computers, smart phones, web phones, and the like. As power dissipation and processing
performance are also of concern in desktop and line-powered computer systems and micro-controller applications,
particularly from a reliability standpoint, it is also contemplated that the present invention may also provide benefits to
such line-powered systems.

[0123] Fabrication of the digital systems disclosed herein involves multiple steps of implanting various amounts of
impurities into a semiconductor substrate and diffusing the impurities to selected depths within the substrate to form
transistor devices. Masks are formed to control the placement of the impurities. Multiple layers of conductive material
and insulative material are deposited and etched to interconnect the various devices. These steps are performed in a
clean room environment.

[0124] A significant portion of the cost of producing the data processing device involves testing. While in wafer form,
individual devices are biased to an operational state and probe tested for basic operational functionality. The wafer is
then separated into individual dice which may be sold as bare die or packaged. After packaging, finished parts are
biased into an operational state and tested for operational functionality.

[0125] The digital systems disclosed herein contain hardware extensions for advanced debugging features. These
assist in the development of an application system. Since these capabilities are part of the megacell itself, they are
available utilizing only a JTAG interface with extended operating mode extensions. They provide simple, inexpensive,
and speed independent access to the core for sophisticated debugging and economical system development, without
requiring the costly cabling and access to processor pins required by traditional emulator systems or intruding on
system resources.

[0126] As used herein, the terms "applied," "connected," and "connection" mean electrically connected, including
where additional elements may be in the electrical connection path. "Associated" means a controlling relationship,
such as a memory resource that is controlled by an associated port. The terms assert, assertion, de-assert,
de-assertion, negate and negation are used to avoid confusion when dealing with a mixture of active high and active
low signals. Assert and assertion are used to indicate that a signal is rendered active, or logically true. De-assert,
de-assertion, negate, and negation are used to indicate that a signal is rendered inactive, or logically false.

[0127] While the invention has been described with reference to illustrative embodiments, this description is not
intended to be construed in a limiting sense. Various other embodiments of the invention will be apparent to persons
skilled in the art upon reference to this description. For example, in another embodiment, the TLB may be limited to a
single processor and not shared, or it may include only a single level without µTLBs.

[0128] In another embodiment, the TLB may be controlled by other means than a state machine controller, such as
directly by an associated processor, for example.

[0129] In another embodiment, there may be several distinct MMUs with associated TLBs, wherein certain of the 
TLBs may include aspects of the invention and certain others may not. 

[0130] It is therefore contemplated that the appended claims will cover any such modifications of the embodiments
as fall within the true scope and spirit of the invention.



Claims 

1. A method of operating a digital system having a processor and associated translation lookaside buffer (TLB),
comprising the steps of:

executing a plurality of program tasks within the processor;

initiating a plurality of memory access requests in response to the plurality of program tasks;

caching a plurality of translated memory addresses in the TLB responsive to the plurality of memory access
requests;

incorporating a task identification value with each translated memory address to identify which of the plurality
of program tasks requested the respective translated memory address; and

locking or unlocking a portion of the plurality of translated memory addresses in the TLB in a manner that is
qualified by the task identification value, such that only an entry of a selected program task in the plurality of
translated memory addresses is affected.

2. The method according to Claim 1, wherein the step of locking or unlocking comprises locking or unlocking only
and all of the plurality of translated addresses that have the selected task identification value.

3. The method of Claim 1 or 2, wherein the TLB has several levels, and wherein the step of locking or unlocking 
encompasses all of the several levels of the TLB. 

4. The method according to any previous Claim, further comprising the step of incorporating a second qualifier value
with each translated memory address; and

wherein the step of locking or unlocking is qualified by both the task identification value and the second
qualifier value.

5. The method of Claim 4, wherein the digital system has a plurality of processors and wherein the second qualifier
value identifies which of the plurality of processors requested the respective translated memory address.

6. The method according to any previous Claim, further comprising the step of replacing a selected victim translated
memory address with a different translated memory address, wherein the victim translated memory address is
selected only from a portion of the plurality of translated memory addresses in the TLB that is not locked.

7. The method according to any previous Claim, further comprising the step of reserving a portion of the entry locations 
from being locked. 

8. A digital system having a translation lookaside buffer (TLB), the TLB comprising:

storage circuitry with a plurality of entry locations for holding translated values, wherein each of the plurality
of entry locations includes a first field for a translated value and a second field for an associated qualifier value;

a set of inputs for receiving a translation request;

a set of outputs for providing a translated value selected from the plurality of entry locations; and

control circuitry connected to the storage circuitry, wherein the control circuitry is responsive to an operation
command to lock or unlock selected ones of the plurality of entry locations which have a first qualifier value
in the second field.

9. The digital system of Claim 8, wherein the digital system further comprises a second level TLB connected to the
TLB, the second level TLB comprising:

second level storage circuitry with a plurality of entry locations for holding translated values, wherein each of
the plurality of entry locations includes a first field for a translated value and a second field for an associated
qualifier value; and

wherein the control circuitry is connected to the second level storage circuitry, the control circuitry being
responsive to an operation command to lock or unlock selected ones of the plurality of entry locations in the
second storage circuitry which have a first qualifier value in the second field, such that qualified entry locations
in the TLB and in the second level TLB are locked or unlocked in response to a single operation command.

10. The digital system according to any of Claims 8-9, wherein each of the plurality of entry locations in the storage
circuitry and the second storage circuitry contain a third field for a second associated qualifier value, and

wherein the control circuitry is responsive to an operation command to lock or unlock selected ones of the
plurality of entry locations which have both a specified task ID value in the second field and a specified resource
ID value in the third field.

11. The digital system according to any of Claims 8-10, wherein the control circuitry comprises:

a shift register connected to the storage circuitry for selecting a next victim entry location; and

skip circuitry connected to the shift register and to the storage circuitry, the skip circuitry operable to cause
the shift register to skip over locked entry locations.

12. The digital system according to any of Claims 8-10, wherein the control circuitry comprises reservation circuitry
operable to reserve a portion of the entry locations from being locked.

13. The digital system according to any of Claims 8-12 being a personal digital assistant, further comprising:

a processor (CPU) connected to the TLB and thereby connected to access a memory circuit;

a display, connected to the CPU via a display adapter;

radio frequency (RF) circuitry connected to the CPU; and

an aerial connected to the RF circuitry.



EP 1 182 569 A1

[Drawings: FIG. 2A, FIG. 2B, FIG. 3B]